Internet draft Packetization of H.261
Packetization of H.261 video streams
Mon Mar 8, 1993
Expires: October 1993
Thierry Turletti, Christian Huitema
INRIA
Christian.Huitema@sophia.inria.fr
Thierry.Turletti@sophia.inria.fr
1. Status of this Memo
This document is an Internet draft. Internet drafts are
working documents of the Internet Engineering Task Force
(IETF), its Areas, and its Working Groups. (Note that other
groups may also distribute working documents as Internet
Drafts.)
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted
by other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other
than as a "working draft" or "work in progress".
Please check the I-D abstract listing contained in each
Internet Draft directory to learn the current status of this
or any other Internet Draft.
Distribution of this document is unlimited.
Turletti, Huitema [Page 1]
2. Purpose of this document
The CCITT recommendation H.261 [1] specifies the encodings
used by CCITT-compliant video-conference codecs. Although
these encodings were originally specified for fixed data rate
ISDN circuits, experiments [2] have shown that they can
also be used over the Internet.
The purpose of this memo is to specify how H.261 video streams
can be carried over UDP and IP, using the RTP protocol [3].
3. Structure of the packet stream
H.261 codecs produce a bit stream. In fact, H.261 and
companion recommendations specify several levels of
encoding:
(1) Images are first separated into blocks of 8x8 pixels.
Blocks which have changed are encoded by computing the
discrete cosine transform (DCT) of their pixel values; the
resulting coefficients are then quantized and Huffman
encoded.
(2) The bits resulting from the Huffman encoding are then
arranged in 512-bit frames, containing 2 bits of
synchronization, 492 bits of data and 18 bits of error
correcting code.
(3) The 512-bit frames are then interlaced with an audio
stream and transmitted over p x 64 kbps circuits according
to specification H.221.
When transmitting over the Internet, we will directly consider
the output of the Huffman encoding. We will not carry the
512-bit frames, as protection against errors can be obtained by
other means. Similarly, we will not attempt to multiplex audio
and video signals in the same packets, as UDP and RTP provide
a much more efficient way to achieve multiplexing.
Directly transmitting the result of the Huffman encoding over
an unreliable stream of UDP datagrams would however have very
poor error resistance characteristics. The H.261 coding is in
fact organized as a sequence of images, or frames, which are
themselves organized as a set of Groups of Blocks (GOB). Each
GOB holds a set of 3 lines of 11 macroblocks (MB). Each MB
carries information on a group of 16x16 pixels: luminance
information is specified for 4 blocks of 8x8 pixels, while
chrominance information is only given by two 8x8 "red" and
"blue" blocks.
This grouping is used to specify information at each level of
the hierarchy:
- At the frame level, one specifies information such as
the delay from the previous frame, the image format, and
various indicators.
- At the GOB level, one specifies the GOB number and the
default quantizer that will be used for the MBs.
- At the MB level, one specifies which blocks are present
and which did not change, and optionally a quantizer, as
well as details of the coding such as motion
vectors.
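The geometry implied by this hierarchy can be worked out mechanically. The following is only a sketch: the picture sizes used in the usage note (352x288 for CIF, 176x144 for QCIF) come from H.261 itself and are not restated in this memo, and all identifiers are illustrative.

```c
/* Geometry taken from the text: an MB covers 16x16 pixels and a GOB
 * holds 3 lines of 11 MBs. All names here are illustrative. */
enum {
    MB_SIZE       = 16,
    MBS_PER_LINE  = 11,
    LINES_PER_GOB = 3,
    MBS_PER_GOB   = MBS_PER_LINE * LINES_PER_GOB,  /* 33 macroblocks */
    GOB_WIDTH     = MBS_PER_LINE * MB_SIZE,        /* 176 pixels */
    GOB_HEIGHT    = LINES_PER_GOB * MB_SIZE,       /* 48 pixels */
    BLOCKS_PER_MB = 4 + 2                          /* 4 luminance, 2 chrominance */
};

/* Number of GOBs needed to tile a width x height picture, or -1 if
 * the picture is not a whole number of GOBs in each direction. */
int gobs_per_picture(int width, int height)
{
    if (width <= 0 || height <= 0 ||
        width % GOB_WIDTH != 0 || height % GOB_HEIGHT != 0)
        return -1;
    return (width / GOB_WIDTH) * (height / GOB_HEIGHT);
}
```

With these assumptions, gobs_per_picture(352, 288) yields 12 GOBs for a CIF image and gobs_per_picture(176, 144) yields 3 for QCIF.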
The result of this structure is that one needs to receive the
information present in the frame header to decode the GOBs,
as well as the information present in the GOB header to
decode the MBs. Without precautions, this would mean that one
has to receive all the packets that carry an image in order to
properly decode its components. In fact, experience has
shown that:
(1) It would be unrealistic to carry an image in a single
packet: video images can sometimes be very large.
(2) The data for one GOB would most often fit in a packet.
In fact, several GOBs can often be
grouped in a packet.
Once we have decided to align packet boundaries with GOB
boundaries, a number of decisions
remain to be taken, subject to the following requirements:
(1) The algorithm should be easy to implement when
packetizing the output stream of a hardware codec.
(2) The algorithm should not induce rendition delays -- we
should not have to wait for a following packet to display
an image.
(3) The algorithm should allow for efficient
resynchronization in case of packet losses.
(4) It should be easy to depacketize the data stream and
direct it to a hardware codec's input.
(5) When the hardware decoder operates at a fixed bit rate,
one should be able to maintain synchronization, e.g. by
adding padding bits when the packet arrival rate is
slower than the bit rate.
The H.261 Huffman encoding includes a special "GOB start"
pattern, composed of 15 zeroes followed by a single 1, that
cannot be imitated by any other code words. That pattern marks
the separation between two GOBs, and is in fact used as an
indicator that the current GOB is terminated. The encoding
also includes a stuffing pattern, composed of seven zeroes
followed by four ones; that stuffing pattern can only be
inserted between the encoding of MBs, or just before the GOB
separator.
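As an illustration of how the "GOB start" pattern can be located, here is a minimal bit-by-bit scanner. It is a sketch only: the function name is hypothetical, and it assumes that bits are consumed most significant bit first within each octet, as the tables of Appendix A suggest.

```c
#include <stddef.h>

/* Returns the bit offset (0 = most significant bit of buf[0]) of the
 * first 1 bit that ends a run of at least 15 zero bits, i.e. the last
 * bit of a "GOB start" pattern, or -1 if no such pattern is present. */
long find_gob_start(const unsigned char *buf, size_t len)
{
    int zeros = 0;      /* length of the current run of zero bits */
    size_t i;
    int b;

    for (i = 0; i < len; i++) {
        for (b = 7; b >= 0; b--) {
            if ((buf[i] >> b) & 1) {
                if (zeros >= 15)
                    return (long)(i * 8 + (size_t)(7 - b));
                zeros = 0;
            } else {
                zeros++;
            }
        }
    }
    return -1;
}
```

A table-driven octet-at-a-time version, as in Appendix A, does the same work faster; the bit loop is used here only because it is easier to check by hand.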
The first conclusion of the analysis is that the packets
should contain all the GOB data, including the "GOB start"
pattern that separates the current GOB from the next one. In
fact, as this pattern is well known, we could as well use a
single bit in the data header to indicate its presence.
Not encoding the GOB-start pattern has two advantages:
(1) It reduces the number of bits in the packets, and avoids
the possibility of splitting packets in the middle of a
GOB separator.
(2) It allows gateways to hardware decoders to insert the
stuffing pattern in front of the GOB, in order to meet
the fixed bit rate requirement.
Another problem specific to the H.261
compression is that the GOB data have no particular reason to
fit in an integer number of octets. The data header will thus
contain two three-bit integers, SBIT and EBIT:
SBIT indicates the number of bits that should be ignored in
the first (start) data octet.
EBIT indicates the number of bits that should be ignored in
the last (end) data octet.
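The effect of SBIT and EBIT on a payload can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the ignored bits are the most significant bits of the first octet and the least significant bits of the last octet, which matches the bit order used in Appendix A.

```c
#include <stddef.h>

/* Number of meaningful bits in a payload of len octets, or -1 if the
 * SBIT/EBIT values are out of range or exceed the payload size. */
long h261_payload_bits(size_t len, unsigned sbit, unsigned ebit)
{
    if (sbit > 7 || ebit > 7)
        return -1;
    if (len * 8 < (size_t)(sbit + ebit))
        return -1;
    return (long)(len * 8) - (long)(sbit + ebit);
}

/* Value (0 or 1) of the n-th meaningful bit, counting from 0.
 * The caller must keep n below h261_payload_bits(). */
int h261_payload_bit(const unsigned char *data, unsigned sbit, size_t n)
{
    size_t pos = (size_t)sbit + n;   /* position counted from the MSB */
    return (data[pos / 8] >> (7 - pos % 8)) & 1;
}
```

For example, a 10-octet payload with SBIT = 3 and EBIT = 2 carries 80 - 3 - 2 = 75 meaningful bits.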
Although only the EBIT counter would really be needed for
software coders, the SBIT counter was inserted to ease the
packetization of the output of hardware coders. A sample
packetization procedure is given in Appendix A.
At the receiving sites, the GOB synchronization can be used in
conjunction with the synchronization service of the RTP
protocol. In case of losses, the decoders could become
desynchronized. The "S" bit of the RTP header will be set to
indicate that the packet includes the beginning of the
encoding of a GOB, i.e. the quantizer common to all
macroblocks. The receiver will detect losses by looking at the RTP
sequence numbers. In case of losses, it will ignore all
packets whose "S" bit is not set. Once a packet with the "S"
bit set has been received, it will prepend the GOB start code
to that packet, and resume decoding.
An example packetization program is given in Appendix A.
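The receiver behaviour described in this section can be sketched as a small state machine. This is one possible reading, not a normative procedure: the names are hypothetical, sequence numbers are assumed to be 16 bits wide, any gap or reordering is treated as a loss, and the start code is assumed to be restored on every packet whose "S" bit is set, since the packetizer never transmits it.

```c
enum rx_action {
    RX_DROP,             /* ignore the packet */
    RX_DECODE,           /* continue decoding the current GOB */
    RX_DECODE_NEW_GOB    /* prepend the GOB start code, then decode */
};

struct rx_state {
    int started;         /* nonzero once a packet has been seen */
    int in_sync;         /* nonzero while decoding may proceed */
    unsigned next_seq;   /* expected RTP sequence number */
};

/* Decides what to do with a packet, given its sequence number and
 * its "S" bit. A zero-initialized state starts out of sync. */
enum rx_action rx_on_packet(struct rx_state *st, unsigned seq, int s_bit)
{
    seq &= 0xFFFFu;
    if (st->started && seq != st->next_seq)
        st->in_sync = 0;               /* gap in sequence numbers */
    st->started = 1;
    st->next_seq = (seq + 1) & 0xFFFFu;

    if (!st->in_sync) {
        if (!s_bit)
            return RX_DROP;            /* wait for the next GOB start */
        st->in_sync = 1;
    }
    return s_bit ? RX_DECODE_NEW_GOB : RX_DECODE;
}
```

A receiver joining in mid-stream is handled like a loss: nothing is decoded until the first packet carrying the "S" bit arrives.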
4. Usage of RTP
The H.261 data is carried as payload within the RTP
protocol, using the following header fields:
 ____________________________________________
| Ver       | Protocol version (1).          |
|___________|________________________________|
| Flow      | Identifies one particular      |
|           | video stream.                  |
|___________|________________________________|
| Content   | H.261 encoded video (31).      |
|___________|________________________________|
| Sequence  | Identifies the packet within   |
| number    | a stream.                      |
|___________|________________________________|
| Sync      | Set if the packet is           |
|           | synchronized on an image or    |
|           | on a group of blocks.          |
|___________|________________________________|
| Timestamp | The date at which the          |
|           | image was grabbed.             |
|___________|________________________________|
The very definition of these settings implies that the
beginning of an image shall always coincide with the beginning
of a packet. The RTP sequence number can be used to detect missing
packets. In this case, one shall ignore all incoming packets
until the next synchronization mark is received. The H.261
data will follow the RTP options, as in:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver| flow |F|S| content | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp (seconds) | timestamp (fraction) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
. .
. RTP options (optional) .
. .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| H.261 options | H.261 stream... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The H.261 options field is defined as follows:
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|SBIT |E|EBIT |C|I|V|0|  FMT  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 ______________________________________________________
| S (1 bit)     | Start of GOB. Set if                 |
|               | the packet is a start of GOB.        |
|_______________|______________________________________|
| SBIT (3 bits) | Start bit position:                  |
|               | number of bits that should           |
|               | be ignored in the first              |
|               | (start) data octet.                  |
|_______________|______________________________________|
| E (1 bit)     | End of GOB. Set if                   |
|               | the packet is an end of GOB.         |
|_______________|______________________________________|
| EBIT (3 bits) | End bit position:                    |
|               | number of bits that should           |
|               | be ignored in the last               |
|               | (end) data octet.                    |
|_______________|______________________________________|
| C (1 bit)     | Color flag. Set if                   |
|               | color is encoded.                    |
|_______________|______________________________________|
| I (1 bit)     | Full Intra Image flag.               |
|               | Set if it is the first packet        |
|               | of a full intra image.               |
|_______________|______________________________________|
| V (1 bit)     | Movement vector flag.                |
|               | Set if movement vectors              |
|               | are encoded.                         |
|_______________|______________________________________|
| FMT (4 bits)  | Image format:                        |
|               | QCIF, CIF or number of CIF in SCIF.  |
|_______________|______________________________________|
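The 16-bit options field lends itself to simple shift-and-mask code. The sketch below is illustrative only: the structure and function names are hypothetical, and it assumes the usual convention that bit 0 of the diagram is the most significant bit of the field.

```c
/* One member per field of the H.261 options word. */
struct h261_options {
    unsigned s;      /* start of GOB (1 bit) */
    unsigned sbit;   /* bits to ignore in the first octet (3 bits) */
    unsigned e;      /* end of GOB (1 bit) */
    unsigned ebit;   /* bits to ignore in the last octet (3 bits) */
    unsigned c;      /* color flag (1 bit) */
    unsigned i;      /* full intra image flag (1 bit) */
    unsigned v;      /* movement vector flag (1 bit) */
    unsigned fmt;    /* image format (4 bits) */
};

/* Packs the fields into the low 16 bits of the result; the unnamed
 * bit between V and FMT is left at zero. */
unsigned h261_options_pack(const struct h261_options *o)
{
    return ((o->s & 1u) << 15) | ((o->sbit & 7u) << 12) |
           ((o->e & 1u) << 11) | ((o->ebit & 7u) << 8) |
           ((o->c & 1u) << 7)  | ((o->i & 1u) << 6) |
           ((o->v & 1u) << 5)  | (o->fmt & 0xFu);
}

/* Splits a 16-bit options word back into its fields. */
void h261_options_unpack(unsigned w, struct h261_options *o)
{
    o->s    = (w >> 15) & 1u;
    o->sbit = (w >> 12) & 7u;
    o->e    = (w >> 11) & 1u;
    o->ebit = (w >> 8) & 7u;
    o->c    = (w >> 7) & 1u;
    o->i    = (w >> 6) & 1u;
    o->v    = (w >> 5) & 1u;
    o->fmt  = w & 0xFu;
}
```

On the wire the word would be sent high octet first, so the S and SBIT fields travel in the first octet after the RTP options.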
The image format (4 bits) is defined as follows:
 ____________________________
| QCIF               |   0000|
|____________________|_______|
| CIF                |   0001|
|____________________|_______|
| SCIF 0             |       |
| upper left corner  |   0100|
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 1             |       |
| upper right corner |   0101|
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 2             |       |
| lower left corner  |   0110|
| CIF in SCIF image  |       |
|____________________|_______|
| SCIF 3             |       |
| lower right corner |   0111|
| CIF in SCIF image  |       |
|____________________|_______|
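The format codes of this table can be captured in an enumeration. The identifiers below are hypothetical; the sketch only relies on the values listed above, in which the four SCIF codes run from 0100 to 0111 in quadrant order.

```c
/* FMT values from the table above. */
enum h261_fmt {
    H261_FMT_QCIF  = 0x0,   /* 0000 */
    H261_FMT_CIF   = 0x1,   /* 0001 */
    H261_FMT_SCIF0 = 0x4,   /* 0100, upper left corner  */
    H261_FMT_SCIF1 = 0x5,   /* 0101, upper right corner */
    H261_FMT_SCIF2 = 0x6,   /* 0110, lower left corner  */
    H261_FMT_SCIF3 = 0x7    /* 0111, lower right corner */
};

/* Nonzero if fmt is one of the values listed in the table. */
int h261_fmt_is_defined(unsigned fmt)
{
    return fmt == H261_FMT_QCIF || fmt == H261_FMT_CIF ||
           (fmt >= H261_FMT_SCIF0 && fmt <= H261_FMT_SCIF3);
}

/* SCIF quadrant number (0..3), or -1 if fmt is not a SCIF sub-image. */
int h261_fmt_scif_quadrant(unsigned fmt)
{
    if (fmt >= H261_FMT_SCIF0 && fmt <= H261_FMT_SCIF3)
        return (int)(fmt - H261_FMT_SCIF0);
    return -1;
}
```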
5. Usage of RTP parameters
When sending or receiving H.261 streams through the RTP
protocol, the stations should be ready to:
(1) process or ignore all generic RTP parameters,
(2) send or receive H.261 specific "Reverse Application Data"
parameters, to request a video resynchronization.
This memo describes two "RAD" item types, "Full Intra Request"
and "Negative Acknowledge".
5.1. Controlling the reverse flow
Support of the reverse application data by the H.261 sender is
optional; in particular, early experiments have shown that the
usage of this feature could have very negative effects when
the number of recipients is very large.
Recipients learn the return address where RAD information may
be sent from the Content description (CDESC) item, which may
be included as an RTP option in any of the video packets. The
CDESC item includes a Return port number value. A value of
zero indicates that no reverse control information should be
returned.
A recipient shall never send a RAD item if it has not yet
received a CDESC item from the source, or if the port number
received in the last CDESC item was zero.
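The rule above reduces to a two-part test on the recipient side. The following sketch uses hypothetical names and keeps only the state needed for the decision; parsing of the CDESC item itself is not shown.

```c
/* What a recipient remembers about the source's CDESC items. */
struct cdesc_state {
    int received;           /* nonzero once a CDESC item has been seen */
    unsigned return_port;   /* port from the most recent CDESC item */
};

/* Records the Return port number of a newly received CDESC item. */
void cdesc_update(struct cdesc_state *st, unsigned return_port)
{
    st->received = 1;
    st->return_port = return_port;
}

/* Nonzero if the recipient is allowed to send a RAD item. */
int may_send_rad(const struct cdesc_state *st)
{
    return st->received && st->return_port != 0;
}
```

Because only the last CDESC item counts, a source can switch the reverse flow off at any time by advertising a port number of zero.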
Emitters should identify themselves by sending CDESC items at
regular intervals.
5.2. Full Intra Request
The "Full Intra Request" items are identified by the item Type
"FIR" (0).
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| RAD | length = 1 | Type | Z | Flow |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
These packets indicate that a recipient has lost all video
synchronization, and request that the emitter send the next
image in "Intra" coding mode, i.e. without using differential
coding. The various fields are defined as follows:
 _______________________________________________
| F      | Last option bit, as defined by RTP.  |
|________|______________________________________|
| RAD    | RAD option type (65).                |
|________|______________________________________|
| Length | One 32-bit word.                     |
|________|______________________________________|
| Type   | FIR (0).                             |
|________|______________________________________|
| Z      | Must be zero.                        |
|________|______________________________________|
| Flow   | The flow id of the incoming packets. |
|________|______________________________________|
5.3. Negative Acknowledge
Packet losses are detected using the RTP sequence number.
After a packet loss, the receiver will resynchronize on the
next GOB. However, as H.261 uses differential encoding, parts
of the images may remain blurred for a rather long time.
As all GOBs belonging to a given video image carry the same
timestamp, the receiver can determine the list of GOBs which
were actually received for that timestamp, and thus
identify the "missing blocks". Requesting a specific
reinitialization of these missing blocks is more efficient
than requesting a complete reinitialization of the image
through the "Full Intra Request" item.
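Determining which GOBs to report can be sketched as a scan over a per-image reception map. The function name is hypothetical, and the sketch assumes GOBs numbered consecutively from 1 (as in a 12-GOB CIF image); other numbering schemes would need a different index.

```c
/* received[n] is nonzero if GOB number n (1..ngobs) arrived for the
 * timestamp under consideration; received[0] is unused.
 * Stores the first and last missing GOB numbers and returns 1, or
 * returns 0 and leaves the outputs untouched if nothing is missing. */
int find_lost_gobs(const unsigned char *received, int ngobs,
                   int *fgobl, int *lgobl)
{
    int n, found = 0;

    for (n = 1; n <= ngobs; n++) {
        if (!received[n]) {
            if (!found)
                *fgobl = n;
            *lgobl = n;
            found = 1;
        }
    }
    return found;
}
```

Since a request carries a single first/last pair, the reported range may also cover GOBs that did arrive when the losses are not contiguous.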
The format of the video-nack option is as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F| RAD | length = 3 | Type | Z | Flow |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FGOBL | LGOBL | MBZ | FMT |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp (seconds) | timestamp (fraction) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The different fields have the following values:
 _______________________________________________________
| F             | Last option bit, as defined by RTP.   |
|_______________|_______________________________________|
| RAD           | RAD option type (65).                 |
|_______________|_______________________________________|
| Length        | Three 32-bit words.                   |
|_______________|_______________________________________|
| Type          | NACK (1).                             |
|_______________|_______________________________________|
| Z             | Must be zero.                         |
|_______________|_______________________________________|
| Flow          | The flow id of the incoming packets.  |
|_______________|_______________________________________|
| FGOBL         | First GOB Lost:                       |
|               | number of the first GOB lost.         |
|_______________|_______________________________________|
| LGOBL         | Last GOB Lost:                        |
|               | number of the last GOB lost.          |
|_______________|_______________________________________|
| MBZ           | Must be zero.                         |
|_______________|_______________________________________|
| FMT           | Repeats the format indicator of the   |
|               | received image, including the number  |
|               | of the SCIF subimage if present.      |
|_______________|_______________________________________|
| Timestamp     | The RTP timestamp of the              |
|               | original image.                       |
|_______________|_______________________________________|
6. References
[1] CCITT Recommendation H.261, "Video codec for audiovisual
services at p x 64 kbit/s".
[2] Thierry Turletti, "H.261 software codec for
videoconferencing over the Internet", INRIA Research Report
no. 1834.
[3] Henning Schulzrinne, "A Transport Protocol for Real-Time
Applications", Internet Draft, December 15, 1992.
Appendix A
The following code can be used to packetize the output of an
H.261 codec:
#include <stdio.h>
#define BUFFER_MAX 512
/* right[c]: number of leading zero bits of octet c (8 if c is 0) */
int right[] = {
8,7,6,6,5,5,5,5,4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
/* left[c]: number of trailing zero bits of octet c (8 if c is 0) */
int left[] = {
8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0};
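/* Supplied by the application: sends len octets of buf as one packet. */
extern void send_message(unsigned char *buf, int len, int sbit, int ebit,
                         int end_of_group, int start_of_group);

void h261_sync(FILE *F)
{
    int i, ebit, sbit, start_of_group, c, nz;
    unsigned char buf[BUFFER_MAX];

    i = 0;
    sbit = 0;
    start_of_group = 1;
    nz = 0;                 /* length of the current run of zero bits */
    while ((c = getc(F)) != EOF) {
        buf[i++] = c;
        if (c == 0) {
            nz += 8;
        } else if (nz + right[c] >= 15) {
            /* GOB start code found: its final 1 is bit right[c] of c,
               so the data of the current GOB ends in buf[i - 3]. */
            if (right[c] == 7) {
                /* the start code ends on an octet boundary */
                ebit = 0;
                if (i > 2)
                    send_message(buf, i - 2, sbit, ebit,
                                 1, start_of_group);
                sbit = 0;
                i = 0;
            } else {
                /* c also holds the first bits of the next GOB */
                ebit = 7 - right[c];
                if (i > 2)
                    send_message(buf, i - 2, sbit, ebit,
                                 1, start_of_group);
                i = 0;
                buf[i++] = c;
                sbit = right[c] + 1;
            }
            start_of_group = 1;
            nz = left[c];
        } else {
            nz = left[c];
        }
        if (i >= BUFFER_MAX) {
            /* buffer full in mid-GOB: send all but the last two
               octets, which may belong to a start code */
            send_message(buf, i - 2, sbit, 0, 0, start_of_group);
            buf[0] = buf[i - 2];
            buf[1] = buf[i - 1];
            i = 2;
            sbit = 0;
            start_of_group = 0;
        }
    }
    /* end of stream: send whatever remains of the last GOB */
    if (i > 0)
        send_message(buf, i, sbit, 0, 1, start_of_group);
}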
Table of Contents
1 Status of this Memo ................................... 1
2 Purpose of this document .............................. 2
3 Structure of the packet stream ........................ 2
4 Usage of RTP .......................................... 6
5 Usage of RTP parameters ............................... 9
5.1 Controlling the reverse flow ........................ 9
5.2 Full Intra Request .................................. 9
5.3 Negative Acknowledge ................................ 10
6 References ............................................ 12
Appendix A ............................................. 13